Abstract



In this paper, a compressed membership problem for finite automata, both deterministic
and non-deterministic, with compressed transition labels is studied. The compression is
represented by straight-line programs (SLPs), i.e. context-free grammars generating exactly one
string. A novel technique of dealing with SLPs is introduced: the SLPs are recompressed, so
that substrings of the input text are encoded in SLPs labelling the transitions of the NFA
(DFA) in the same way as in the SLP representing the input text. To this end, the SLPs are
locally decompressed and then recompressed in a uniform way. Furthermore, such recompression
induces only small changes in the automaton; in particular, the size of the automaton
remains polynomial.

Using this technique it is shown that the compressed membership for NFA with compressed
labels is in NP, thus confirming the conjecture of Plandowski and Rytter [18] and extending
the partial result of Lohrey and Mathissen [13]; as it is already known that this problem is
NP-hard, we settle its exact computational complexity. Moreover, the same technique applied
to the compressed membership for DFA with compressed labels yields that this problem is in
P; for this problem, only the trivial upper bound PSPACE was known.

1 Introduction 

1.1 Compression and Straight-Line Programs

Due to the ever-increasing amount of data, compression methods are widely applied in order to
decrease the data's size. Still, the stored data sometimes need to be processed, and decompressing
them on each occasion is wasteful. Thus there is a large demand for algorithms working directly on
the compressed data. Such a task is not as desperate as it may seem: it is a popular outlook that
compression basically extracts the hidden structure of the text, and if the compression rate is high,
the text must have a lot of internal structure. It is natural to assume that such a structure
will help in devising methods dealing directly with the compressed representation of the data.
Indeed, efficient algorithms for fundamental text operations (pattern matching, checking equality,
etc.) are known for various practically used compression methods (LZ, LZW, etc.) [21 EE].

When devising algorithms for compressed data, quite early one needs to focus on the exact
compression method to which the algorithm is applied. The most practical, and challenging,
choice is one of the widely used standards, like LZW or LZ. However, a different approach was
also proposed: for some applications and for most theory-oriented considerations it would be
useful to model one of the practical compression standards by a more mathematically well-founded
method. This idea, among others, lay at the foundations of the notion of Straight-Line Programs
(SLPs), whose instances can be simply seen as context-free grammars generating exactly one string.
SLPs are the most popular theoretical model of compression. This is on one hand motivated
by a simple, 'clean' and appealing definition; on the other hand, they model the LZ compression
standard: each LZ-compressed text can be converted into an equivalent SLP with only logarithmic
overhead (and in polynomial time), while each SLP can be converted to an equivalent LZ-compressed
text with just a constant overhead (and in polynomial time).

The approach of modelling compression by SLPs in order to develop efficient algorithms turned
out to be fruitful. Algorithmic problems for SLP-compressed input strings were considered and
successfully solved [HI E3 UH]. The recent state-of-the-art efficient algorithms for pattern matching
in LZ and LZW compressed text essentially use the reformulation of LZW and LZ methods in
terms of SLPs [2J [3]. SLPs also found their usage in programme verification [H [S]. Attempts were
made to create out-of-the-box tools for texts represented by SLPs, e.g. an efficient indexing
structure for SLPs was recently developed [T].

Surprisingly, while SLPs were introduced mainly as a model for practical applications, they 
turned out to be useful also in strictly theoretical branches of computer science, for instance, their 
usage was important in the famous proof of Plandowski, that satisfiability of word equations is in 

PSPACE nn. 

1.2 Membership problem 

As should already be clear, SLPs are used both in theoretical and in applied research in
computer science. Hence, tools for them should be developed. In particular, one should be aware
that whenever working with strings, these strings may be supplied as respective SLPs. Hence,
all the usual string problems should be reinvestigated in the compressed setting, as the classical
algorithms may not apply directly or may be inefficient or, worse, some of these problems may become
computationally difficult.

From the language theory point of view, the crucial question stated in terms of strings is the one of
compressed string recognition. To be more precise, we consider classic membership problems, i.e.
recognition by automata, generation by a grammar etc., in which the input is supplied as an SLP.
We refer to such problems as compressed membership problems. These were first studied in the
pioneering work of Plandowski and Rytter [18], who considered the compressed membership problem
for various formalisms for defining languages. Already in this work it was observed that we should
precisely specify what part of the input is compressed. Clearly the input string, but what about
the language representation (i.e. regular expression, automaton, grammar, etc.)? Should it be also
compressed or not? Both variants of the problem are usually considered, with the following naming
convention: when only the input string is compressed, we use the name compressed membership;
when the language representation is compressed as well, we prepend fully to the name.

In the following years, the compressed membership problem was investigated for various language
classes EJ [TTJ [T21 [TH]. The compressed word problem for groups and monoids [TTJ [TTJ [T5], which
can be seen as a generalisation of the membership problem, was also investigated.

Despite the large attention in the research community, the exact computational complexity of
some problems remains open. The most notorious of those is the fully compressed membership
problem for NFA, considered already in the work of Plandowski and Rytter [18]. Here, the
compression of the NFA is done by allowing transitions by strings, and not only by single letters, and
representing these strings as SLPs.

It is relatively easy to observe that the compressed membership problem for the NFA is in P;
however, the status of the fully compressed variant remained open for a long time. Some partial
results were obtained by Plandowski and Rytter [18], who observed that it is in PSPACE and is
NP-hard for the case of a unary alphabet, both of these bounds being relatively simple. Moreover,
they showed that this problem is in NP for some particular cases, for instance, in the case of
a one-letter alphabet. Further work on the problem was done by Lohrey and Mathissen [13], who
demonstrated that if the strings defined by SLPs have polynomial periods, the problem is in NP,
and when all strings are highly aperiodic, it is in P.
1.3 Our results and techniques

We establish the computational complexity of fully compressed membership problems for both 
NFAs and DFAs. 

Theorem 1. Fully compressed membership problem for NFA is in NP, for DFA it is in P. 

Our approach to the problem is essentially different from the approach of Plandowski and
Rytter [18] and Lohrey and Mathissen [13]. The main ideas utilised in the previous approaches
focused on the properties of strings described by SLPs. We take a completely different approach:
we analyse and change the way strings are described by the SLPs in the instance. That is, we focus
on the SLPs, and not on the encoded strings. Roughly speaking, our algorithm aims at having
all the strings in the instance compressed 'in the same way'. To achieve this goal, we decompress
the SLPs. Since the compressed text can be exponentially long, we do this locally: we introduce
explicit strings into the right-hand sides of the productions. Then, we recompress these explicit
strings uniformly. Since such pieces of text are compressed in the same way, we can 'forget'
about the original substrings of the input and treat the introduced nonterminals as atomic letters.
The idea is that such recompression should shorten the text significantly: roughly one 'round' of
recompression, in which every pair of letters that was present at the beginning of the 'round' is
compressed, should shorten the encoded strings by a constant factor. It remains to implement the
recompression (and changes in the NFA) in npolytime, while keeping the size of N polynomial.

We stress that the idea of local decompression and synchronous recompression of SLPs is new
and promising: there is hope that it can be applied to other problems related to SLPs.

2 Preliminaries 

2.1 Straight line programmes 

Formally, a Straight line programme (SLP) is a context-free grammar G over the alphabet Σ, with
a language consisting of exactly one string. Usually it is assumed that G is in Chomsky normal
form, i.e. each production is either of the form X → Y Z or X → a. This assumption in particular
implies that strings defined by nonterminals have length at most 2^n; since our algorithm will
replace some substrings by shorter ones, no string defined by SLPs during the run of the algorithm
will exceed this length.

We denote the string defined by nonterminal A by word(A). This notion extends to word(α)
for α ∈ (𝒳 ∪ Σ)* in the usual way.
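As a small illustration of these definitions, the following sketch stores an SLP in Chomsky normal form and computes |word(A)| without decompressing. The dictionary encoding and the names `rules`, `length` and `word` are assumptions made for this example only; they are not notation from the paper.

```python
from functools import lru_cache

# Hypothetical encoding of an SLP in Chomsky normal form: each nonterminal
# maps either to a single letter or to a pair of (earlier) nonterminals.
rules = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),  # word(X3) = ab
    "X4": ("X3", "X3"),  # word(X4) = abab
    "X5": ("X4", "X3"),  # word(X5) = ababab
}


@lru_cache(maxsize=None)
def length(nt):
    """Length of word(nt), computed bottom-up without building the string."""
    rhs = rules[nt]
    if isinstance(rhs, str):
        return 1
    left, right = rhs
    return length(left) + length(right)


def word(nt):
    """Explicit word(nt); may be exponentially long, so only for tiny examples."""
    rhs = rules[nt]
    if isinstance(rhs, str):
        return rhs
    left, right = rhs
    return word(left) + word(right)
```

In this encoding, every length is bounded by 2 raised to the number of nonterminals, in line with the bound 2^n mentioned above.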

2.2 Input 

The instance of the fully compressed membership problem for NFA consists of an input string, 
represented by an SLP, and an NFA N, whose transitions may be labelled by SLPs. 

For our purposes it is more convenient to assume that all SLPs are given as a single context-free
grammar G with a set of nonterminals 𝒳 = {X_1, ..., X_n}, the input string is defined by X_n
and the NFA's transitions are labelled with nonterminals of G. Furthermore, in our proof, it is
essential to drop the usual assumption that G is in Chomsky normal form. However, we still
impose some conditions on the productions' right-hand sides. Thus, we require that the grammar
during the run of the algorithm satisfies the following constraints on its form:

each nonterminal has exactly one production, which is of the form (1a)
X_i → u X_j v X_k or X_i → u X_j v or X_i → u, where u, v ∈ Σ* and j, k < i, (1b)
if word(X_i) = ε then X_i is not on the right-hand side of any production. (1c)

The strings u, v and their substrings appear explicitly in a rule; this notion is introduced to
distinguish them from the substrings of word(X_i). Notice that (1) does not exclude the case
when X_i → ε, and allowing such a possibility streamlines the analysis.
Without loss of generality we may assume that the input string starts and ends with designated,
unique symbols, denoted as $ and #. These are not essential; however, the first and last letter
of word(X_n) need to be treated in a somewhat special manner, and this applies to their
appearances in the NFA as well. Having special symbols for the first and last letter makes the
analysis smoother.

2.3 Input size, complexity classes 

The size |G| of the representation of grammar G is the sum of the lengths of the right-hand sides of
G's rules. The size |N| of the representation of NFA N is the sum of the number of its states and
transitions. The size |Σ| of the alphabet Σ is simply the number of elements in Σ.

The input (or, in general, current instance) size is polynomial in |N|, |G|, |Σ| and n, which denotes
the number of nonterminals in G. We point out that one of the crucial properties of our algorithm
is that n is not modified during the whole run of the algorithm.

By npolytime (polytime) we denote the class of algorithms running in non-deterministic (deter- 
ministic, respectively) polynomial time, and by NP (P, respectively) the corresponding complexity 
classes of the decision problems. 

2.4 Known results 

We use the following basic result, which states that the fully compressed membership problem, 
when the input string is over a unary alphabet, is in NP for NFA and in P for DFA. 

Lemma 1 (cf. [18, Theorem 5]). The fully compressed membership problem restricted to the input
string over an alphabet Σ = {a} is in NP for NFA and in P for DFA.

The first claim can be easily inferred from the result of Plandowski and Rytter [18, Theorem 5],
who proved that this problem is in NP when Σ = {a}, i.e. also transitions in the NFA are labelled
by powers of a only. The second claim is trivial.

2.5 Path and labels in NFA 

Since we deal with automata, proofs will be concerned with (accepting) paths for strings. We shall
consider NFAs for which transitions are labelled with either letters or nonterminals of G. That
is, the transition relation δ satisfies δ ⊆ Q × (Σ ∪ 𝒳) × Q. Consequently, a path 𝒫 from state p_1
to p_{k+1} is a sequence α_1 α_2 ... α_k, where α_i ∈ Σ ∪ 𝒳 and δ(p_i, α_i, p_{i+1}). We write that 𝒫 induces
such a list of labels. The word(𝒫) defined by such a path 𝒫 is simply word(α_1 ... α_k). We also
say that 𝒫 is a path for a string word(𝒫). A path is accepting if it ends in an accepting state. A
string w is accepted by N if there is an accepting path from the starting state for w.
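To make the notion of a path for a string concrete, here is a naive acceptance check that expands every label explicitly. It is only a sketch for tiny instances: the triple encoding of transitions and the names `accepts` and `word_of` are assumptions of this example, and the explicit expansion is exactly what the algorithm of this paper avoids.

```python
def accepts(transitions, start, accepting, w, word_of):
    """Naive check whether w has an accepting path from `start`.

    transitions: iterable of (p, label, q) triples
    word_of:     function mapping a label (letter or nonterminal) to its
                 explicit string
    """
    frontier = {(start, 0)}          # pairs (state, number of letters read)
    seen = set(frontier)
    while frontier:
        next_frontier = set()
        for state, pos in frontier:
            for p, label, q in transitions:
                if p != state:
                    continue
                s = word_of(label)
                if w.startswith(s, pos):
                    item = (q, pos + len(s))
                    if item not in seen:
                        seen.add(item)
                        next_frontier.add(item)
        frontier = next_frontier
    return any((q, len(w)) in seen for q in accepting)


# Usage: the nonterminal X defines "ab"; letters stand for themselves.
labels = {"X": "ab"}
word_of = lambda label: labels.get(label, label)
transitions = [(0, "$", 1), (1, "X", 1), (1, "a", 2), (2, "#", 3)]
```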

3 Basic classifications and outline of the algorithm 

In this section we present the outline of the algorithm for fully compressed membership for NFAs.
Its main part consists of recompression, i.e. replacing strings appearing in word(X_n) by shorter
ones. In some cases such replacing is harder, in others easier. It should be intuitively clear that it
depends on the position of letters inside encoded texts: if a is a first or last letter of some word(X_i),
then recompressing strings including a looks difficult, as such strings can be split into different
nonterminals and recompression requires heavy modification of G, or even rebuilding of the NFA.
On the other hand, if a letter a is only 'inside' strings encoded by nonterminals, its compression
is done only 'inside' rules of G, which seems easy. Thus, before we state the algorithm, we first
introduce a classification of letters (and strings) into 'easy' and 'difficult' to compress.
3.1 Crossing appearances, types of letters, non-extendible appearances

We say that a string w has a crossing appearance in a (string defined by) nonterminal X_i with
a production X_i → u X_j v X_k, if w appears in word(X_i), but this appearance is contained
in none of u, v, word(X_j), word(X_k). Intuitively, this appearance 'crosses' the symbols of
u word(X_j) v word(X_k), i.e. at the same time part of w is in the explicit substring (u or v) and
part is in the compressed strings (word(X_j) or word(X_k)). This notion is similarly defined for
nonterminals with productions of the form X_i → u X_j v; productions of the form X_i → u clearly
do not have crossing appearances.

A string w has a crossing appearance in the NFA N, if there is a path in N inducing a list of
labels α_1 α_2, where α_1, α_2 ∈ {X_1, ..., X_n} ∪ Σ with at least one of α_1, α_2 being a nonterminal,
such that w appears in word(α_1 α_2), but this appearance is contained neither in word(α_1) nor in
word(α_2). The intuition is similar to the case of a crossing appearance in a rule: it is possible that
a string w is split between two transitions' labels. Still, there is nothing difficult in consecutive
letter transitions, thus we treat such a case as a simple one.

We say that a pair of different letters ab is a crossing pair, if ab has a crossing appearance of
any kind. Otherwise, such a pair is non-crossing.

We say that a letter a ∈ Σ is right-outer (left-outer), if there is a nonterminal X_i such that a
is the leftmost (rightmost, respectively) symbol in word(X_i). A letter is outer, if it is left-outer or
right-outer. Otherwise, the letter is inner. Notice that if a pair ab is crossing, then a is left-outer
or b is right-outer. The outer letters and crossing pairs correspond to the intuitive notion of 'hard'
to compress.

The following lemma shows, that while G may encode long strings, they have relatively few 
different short substrings and few outer letters. 

Lemma 2. There are at most 2n different outer letters and at most |G| + 3n different pairs of
letters appearing in word(X_1), ..., word(X_n).

The set of outer letters, the set of crossing pairs and the set of non-crossing pairs appearing
in word(X_1), ..., word(X_n) can be computed in polytime.
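One way to see why the outer letters are easy to compute is the following bottom-up sketch, which determines the leftmost and rightmost letter of every word(X_i) without expanding it. The list encoding of rules (letters as one-character strings, nonterminals as integer indices of earlier rules) and the function name `outer_letters` are assumptions of this illustration, not the construction used in the proof.

```python
def outer_letters(rules):
    """Return (leftmost, rightmost): the sets of letters that begin,
    respectively end, some word(X_i).

    rules[i] is the right-hand side of X_i, a list whose items are letters
    (str) or indices j < i of earlier nonterminals (int).
    """
    first, last = [], []
    for rhs in rules:
        # First/last letter contributed by each item; None for an empty word.
        firsts = [x if isinstance(x, str) else first[x] for x in rhs]
        lasts = [x if isinstance(x, str) else last[x] for x in rhs]
        firsts = [f for f in firsts if f is not None]
        lasts = [l for l in lasts if l is not None]
        first.append(firsts[0] if firsts else None)
        last.append(lasts[-1] if lasts else None)
    leftmost = {f for f in first if f is not None}
    rightmost = {l for l in last if l is not None}
    return leftmost, rightmost


# Usage: word(X_0) = ab, word(X_1) = abcab, word(X_2) = dabcaba.
example = [["a", "b"], [0, "c", 0], ["d", 1, "a"]]
```

Each nonterminal contributes at most one letter to each set, which matches the bound of 2n outer letters in Lemma 2.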

The notions of (non-)crossing pairs do not apply to aa; still, an analogue can be defined: for a
letter a ∈ Σ we say that a^ℓ is a's non-extendible appearance of length ℓ, if it appears in some
string defined by some nonterminal and it is surrounded by letters other than a; formally, if there
exist two letters x, y ∈ Σ, where x ≠ a ≠ y, and a nonterminal X_i such that x a^ℓ y is a substring
of word(X_i). Similarly to crossing pairs, it can be shown that there are not too many different
non-extendible appearances of a.

Lemma 3. For an inner letter a and a grammar G there are at most |G| different lengths of a's
non-extendible appearances in word(X_1), ..., word(X_n). The set of these lengths can be calculated
in polytime.
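The definition can be checked directly on an explicit string. The sketch below (the helper name `non_extendible_lengths` is an assumption of this example) lists the lengths of maximal blocks of a that have a different letter on both sides; it works on decompressed text and therefore only illustrates the definition, not the polytime computation claimed in Lemma 3.

```python
from itertools import groupby


def non_extendible_lengths(w, a):
    """Lengths l such that x a^l y is a substring of w with x != a != y."""
    # Split w into maximal blocks of equal letters: (letter, block length).
    blocks = [(ch, len(list(group))) for ch, group in groupby(w)]
    return {
        k
        for i, (ch, k) in enumerate(blocks)
        # A block counts only if it has a neighbour on both sides.
        if ch == a and 0 < i < len(blocks) - 1
    }
```

For instance, in `$aabaaabab#` the blocks of a have lengths 2, 3 and 1, while in `aab` the only block of a has no left neighbour and is not counted.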

3.2 Outline of the algorithm 

Our algorithm consists of two main operations performed on strings encoded by G:

appearance compression of a: For each a^ℓ that has a non-extendible appearance in word(X_n),
replace all a^ℓ's in word(X_1), ..., word(X_n) by a fresh letter a_ℓ. Modify N accordingly.

pair compression of ab: For two different letters a, b replace each ab in word(X_1), ..., word(X_n)
by a fresh letter c. Modify N accordingly.

We denote the string obtained from w by compression of appearances of a by AC_a(w), and the
string obtained by compression of a pair ab into c by PC_{ab→c}(w).
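On explicit strings both operations are simple rewritings. The following sketch shows them under two assumptions made for this example: the string is given as a list of symbols that starts with $ and ends with # (so every maximal block of a is non-extendible), and the fresh letter a_ℓ is represented by the tuple `(a, ℓ)`. The function names are hypothetical.

```python
from itertools import groupby


def pair_compress(w, a, b, c):
    """PC_{ab -> c}(w): replace each occurrence of the pair ab (a != b) by c."""
    assert a != b
    out, i = [], 0
    while i < len(w):
        if i + 1 < len(w) and w[i] == a and w[i + 1] == b:
            out.append(c)
            i += 2
        else:
            out.append(w[i])
            i += 1
    return out


def appearance_compress(w, a):
    """AC_a(w): replace each maximal block a^l by the fresh letter (a, l)."""
    out = []
    for ch, group in groupby(w):
        k = len(list(group))
        if ch == a:
            out.append((a, k))      # stands for the fresh letter a_l
        else:
            out.extend([ch] * k)
    return out
```

Because a and b are different, occurrences of ab never overlap, so the left-to-right scan in `pair_compress` replaces all of them.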

We adopt the following notational convention throughout the rest of the paper: whenever we refer
to a letter a_ℓ, it means that the last appearance compression was done for a and a_ℓ is the letter
that replaced a^ℓ.
The main idea behind the algorithm is that appearance compression and pair compression
shorten the encoded texts significantly. The challenging part of the algorithm is the modification
of the NFA N 'accordingly' to the changes of SLPs. The general schema is given by Algorithm 1.



Algorithm 1 Outline of the main algorithm

while |word(X_n)| > n do
    while something changed do
        for a: inner letter do
            compress appearances of a, modify N accordingly
        for non-crossing pair ab in word(X_n), a, b ∉ {$, #} do
            compress ab, modify N accordingly
    L ← list of outer letters, except $ and #
    for a ∈ L do
        compress appearances of a, modify N accordingly
        for each a_ℓ b in word(X_n) do
            compress a_ℓ b, modify N accordingly
Decompress X_n and solve the problem naively.



There are two important remarks to be made:

• there is no explicit non-deterministic operation in the code; however, it appears implicitly
in the term 'modify N accordingly' in the appearance compression steps. Roughly, one needs to solve
fully compressed membership for the string a^ℓ for that, and this is known to be NP-hard.

• the compression (both of pairs and appearances) is never applied to $, nor to #. The markers
were introduced so that we do not bother with strange behaviour where the first or last letter is
compressed, and so we do not touch the markers.

It should be more or less obvious that the compressions performed by Algorithm 1 shorten
word(X_n). This is formally stated as follows.

Lemma 4. There are O(n) executions of the outer while loop of Algorithm 1.
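The bound can be motivated by a short calculation. It is only a sketch, not the proof of Lemma 4: it assumes, as the overview in Section 1.3 suggests, that one iteration of the outer loop shortens word(X_n) by a constant factor c < 1, and it uses the bound |word(X_n)| ≤ 2^n from Section 2.1. The constant c and the symbol w_k (the word after k iterations) are notation introduced here for illustration.

```latex
|w_0| \le 2^{n}, \qquad |w_{k}| \le c\,|w_{k-1}|
\quad\Longrightarrow\quad |w_k| \le c^{k}\, 2^{n}.
% The loop stops once |w_k| \le n, which certainly holds when c^k 2^n \le 1:
c^{k}\, 2^{n} \le 1 \iff k \ge \frac{n}{\log_2 (1/c)},
\qquad\text{so } k = O(n) \text{ iterations suffice.}
```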

Remark 1. Notice that pair compression PC_{ab→c} is in fact introducing a new nonterminal with
a production c → ab, similarly AC_a. Hence, Algorithm 1 creates new SLPs, encoding strings
from the instance. However, these new nonterminals are never expanded; they are always treated
as individual symbols. Thus it is better to think of them as letters. In particular, the analysis
of the running time of Algorithm 1 relies on the fact that no new nonterminals are introduced by
Algorithm 1.



4 Details 



In this section we describe in detail how to implement the appearance compression and pair
compression. In particular, we are going to formulate the connections between NFA and SLPs
preserved during Algorithm 1 and demonstrate how to modify the NFA accordingly.



4.1 Invariants 

The invariants below describe the connection between the grammar kept by Algorithm 1 and the
input one.

SLP 1 The set of used nonterminals is a subset of 𝒳 = {X_1, ..., X_n} and the productions are of
the form described in (1).

SLP 2 For every production X_i → α, the original instance contained a production X_i → α',
where the nonterminals appearing in α appear (in the same order) in α'.

SLP 3 The nonterminal X_n has a production X_n → $ u X_{n-1} v #, where u, v ∈ (Σ \ {$, #})*; $, #
are not used in other productions.

The following invariants represent the constraints on the NFA. 

Aut 1 every transition of N is labelled by a single letter of Σ (letter transition) or by a nonterminal
(nonterminal transition); each nonterminal labels at most one transition. No transition is
labelled with X_n.

Aut 2 there is a unique starting state that has a unique outgoing transition, by letter $, and
no incoming transitions; there is no other transition by $. Similarly, there is a unique
accepting state that has a unique incoming transition, by letter #; it does not have any
outgoing transitions; there is no other transition by # in N.

Algorithm 1 will preserve (SLP 1)-(Aut 2) and we shall always assume that the input of the
subroutines satisfies (SLP 1)-(Aut 2).

We assume that the input instance satisfies (SLP 1)-(Aut 2) and, moreover, that the input
grammar is in Chomsky normal form. It is routine to transform (in polytime) input
instances not satisfying these conditions into equivalent instances that satisfy them.

4.2 Compression of non-crossing pairs and inner letters 

The compression of non-crossing pairs and appearance compression for inner letters is intuitively
easy: whenever these appear in strings encoded by G or on paths in N, they cannot be split
between nonterminals or between transitions. Thus, it should be enough to replace their explicit
appearances in the grammar and in the NFA. This is formalised and shown in this subsection.

4.2.1 Compression of non-crossing pairs 

We first demonstrate how to perform the pair compression for non-crossing pairs. Consider a
non-crossing pair ab. Since it is non-crossing, it can only appear in the explicit strings in the
rules of G. Hence, compressing ab into a fresh letter c consists simply of replacing each explicit
ab by c in the right-hand side of every production. Still, ab can appear on a path in N. But since
ab is non-crossing, this can be either wholly inside a nonterminal transition (and so compression
was already taken care of), or on two consecutive letter transitions. This is also easy to handle:
whenever there is a path from p to q by a string ab, we introduce a new letter transition by c from
p to q. This description is formalised in Algorithm 2.

To distinguish between the input and output G and N, we utilise the following convention:
'unprimed' names refer to the input (like G, X_i, N), while 'primed' symbols refer to the output
(like G', X_i', N'). This convention is used for lemmata concerning algorithms throughout the paper.

Algorithm 2 Pair compression for a non-crossing pair ab
1: for each production X_i → α do ▷ Pair compression
2:     replace each explicit ab in α by c
3: for states p, q do ▷ Appropriate modifications of N
4:     if there is a path for ab from p to q then
5:         put a transition δ_{N'}(p, c, q)

We show that Algorithm 2 indeed realises the pair compression for a non-crossing pair.

Lemma 5. Algorithm 2 runs in polytime and preserves (SLP 1)-(Aut 2). When applied to a
non-crossing pair of letters ab, where a, b ∉ {$, #}, it implements the pair compression, i.e.
word(X_i') = PC_{ab→c}(word(X_i)), for each X_i.

N' recognises word(X_n') if and only if N recognises word(X_n). If N is a DFA, so is N'.
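The two loops of Algorithm 2 can be sketched as follows. This is an illustration under assumptions made here, not the paper's implementation: rules are lists whose items are letters (strings) or nonterminal indices (integers), transitions are `(p, label, q)` triples, and the function name is hypothetical. As in Algorithm 2, transitions are only added.

```python
def compress_noncrossing_pair(rules, transitions, a, b, c):
    """Replace explicit ab by c in every rule and add c-transitions to the NFA.

    Assumes that ab is a non-crossing pair, so every occurrence of ab is
    explicit in some rule or lies on two consecutive letter transitions.
    """
    new_rules = []
    for rhs in rules:
        out, i = [], 0
        while i < len(rhs):
            if i + 1 < len(rhs) and rhs[i] == a and rhs[i + 1] == b:
                out.append(c)
                i += 2
            else:
                out.append(rhs[i])
                i += 1
        new_rules.append(out)

    new_transitions = set(transitions)
    for p, x, mid in transitions:
        if x != a:
            continue
        for mid2, y, q in transitions:
            if mid2 == mid and y == b:
                # A path p -a-> mid -b-> q yields a direct transition by c.
                new_transitions.add((p, c, q))
    return new_rules, new_transitions


# Usage: word(X_0) = babb and word(X_1) = $babbabbabb#; ab is non-crossing here.
rules = [["b", "a", "b", "b"], ["$", 0, "a", "b", 0, "#"]]
transitions = {(0, "a", 1), (1, "b", 2)}
```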
4.2.2 Inner letter appearance compression

We can apply the same approach as the one used for non-crossing pairs compression to the
inner letters appearance compression. However, in this case, the modification of N uses
non-determinism.

Since a is an inner letter, it cannot appear as the last or first letter of any nonterminal, and so
every non-extendible appearance of a in word(X_1), ..., word(X_n) is an explicit substring in the
right-hand sides of G's rules; so we simply replace each explicit non-extendible a^ℓ by a fresh letter
a_ℓ in each right-hand side of G's rules. Before considering the NFA, notice that as a is an inner
letter, a^ℓ cannot have a crossing appearance in N, and no nonterminal defines a^ℓ. Hence, when a^ℓ
is a substring of a string defined by a path in N, then a^ℓ appears wholly inside a nonterminal
transition, and this was already taken care of when considering G, or a^ℓ labels a path using letter
transitions only. So it is enough to check whether there is a path for a^ℓ from p to q using only
letter transitions.



Algorithm 3 Appearance compression for an inner letter a
1: establish the lengths ℓ_1, ..., ℓ_k of a's non-extendible appearances
2: for each a^{ℓ_m} do ▷ Appearance compression
3:     for each production X_i → α do
4:         replace every explicit non-extendible a^{ℓ_m} in α by a_{ℓ_m}
5:     for states p, q in N do ▷ Appropriate modifications of N
6:         if δ_N(p, a^{ℓ_m}, q) then ▷ Verify non-deterministically, see Lemma 1
7:             put a transition δ_{N'}(p, a_{ℓ_m}, q)



Lemma 6. Suppose that Algorithm 3 is applied for an inner letter a ∉ {$, #}. Then Algorithm 3
properly implements non-extendible appearance compression, i.e. word(X_i') = AC_a(word(X_i)) for
each X_i, and preserves (SLP 1)-(Aut 2).

The operations in line 6 of Algorithm 3 can be performed in npolytime; other operations can be
performed in polytime.

Each of the new letters a_ℓ is inner. N recognises word(X_n) if and only if N' recognises
word(X_n') for some non-deterministic choices. If N is a DFA, so is N'.
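The test in line 6 asks whether some path of a-transitions spells exactly a^ℓ, where ℓ may be exponentially large. The sketch below is a deterministic illustration of this question under a simplifying assumption made here: every a-transition carries a single letter a. In that special case the question reduces to powering a boolean adjacency matrix by repeated squaring. This is not the procedure of Lemma 1, and it does not cover automata whose transitions are labelled by succinctly stored powers of a; the function name is hypothetical.

```python
def reachable_in_exactly(adj, steps):
    """Boolean matrix R with R[p][q] true iff q is reachable from p by a path
    of exactly `steps` single-letter a-transitions.

    adj[p][q] is True iff there is an a-transition from p to q.
    Repeated squaring uses O(log steps) boolean matrix products.
    """
    n = len(adj)

    def multiply(A, B):
        return [
            [any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]

    result = [[i == j for j in range(n)] for i in range(n)]  # identity
    base = [row[:] for row in adj]
    while steps > 0:
        if steps & 1:
            result = multiply(result, base)
        base = multiply(base, base)
        steps >>= 1
    return result


# Usage: a cycle 0 -> 1 -> 2 -> 0 of a-transitions.
cycle = [
    [False, True, False],
    [False, False, True],
    [True, False, False],
]
```

Since 2^100 leaves remainder 1 modulo 3, state 1 is the only state reachable from state 0 in exactly 2^100 steps on this cycle.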

4.3 Compression of outer letters and crossing pairs 

Now, we turn our attention to the compression of outer letters and crossing pairs. The outline
is as follows: we fix an outer letter a and modify the instance, so that a becomes inner. Then,
Algorithm 3 is applied to a. Next, we want to compress each pair of the form a_ℓ b. Such a pair
can be crossing, as b can be a left-outer letter. Thus, we modify the instance again, so that none
of a_ℓ b is a crossing pair, so that it can be compressed using Algorithm 2.

4.3.1 Transforming an outer letter to an inner letter

The reason why a is an outer letter is that it is the first or the last symbol in some word(X_i).
To make it an inner letter, it is enough to remove each nonterminal's a-prefix and a-suffix. To be
more precise: fix i and let word(X_i) = a^{ℓ_i} u a^{r_i}, where u neither starts nor ends with a. Then our
goal is to modify G so that word(X_i') = u. (If word(X_i) is a power of a, we simply take u = ε and
r_i = 0.) This can be done in a bottom-up fashion, starting from X_1: it is enough to calculate and
memorise the lengths of the a-prefixes and a-suffixes for consecutive nonterminals, see the loop in
Algorithm 4. Then we need to modify the NFA accordingly: it is enough to replace
the transition labelled with X_i by a path consisting of three transitions, labelled with a^{ℓ_i}, X_i' and
a^{r_i}.
Algorithm 4 Changing an outer letter a to an inner letter
1: for i = 1 .. n do ▷ Removing a-prefix and a-suffix
2:     let the production for X_i be X_i → α_i
3:     replace every nonterminal X_j in α_i by a^{ℓ_j} X_j a^{r_j}, remove those defining ε
4:     calculate the explicit a-prefix a^{ℓ_i} and a-suffix a^{r_i} of α_i and remove them
5:     if there is a nonterminal transition δ_N(p, X_i, q) in N then ▷ modification of NFA
6:         create new states p_i, q_i in N, remove transition δ_N(p, X_i, q)
7:         set transitions: δ_N(p, a^{ℓ_i}, p_i), δ_N(p_i, X_i, q_i), δ_N(q_i, a^{r_i}, q)



The removed a-prefixes and a-suffixes can be exponentially long, and so we store them in the
rules in a succinct way, i.e. a^ℓ is represented as (a, ℓ); the size of the representation of ℓ is O(log ℓ),
that is, linear in n. We say that such a grammar is in an a-succinct form. The situation is similar
for the NFA, as it might have transitions labelled with a^ℓ, which are stored in a succinct way as
well. We say that N satisfies a-relaxed (Aut 1), if its transitions are labelled by nonterminals, a
single letter or by a^ℓ, where ℓ ≤ 2^n.
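The grammar part of Algorithm 4, together with the succinct storage of powers, might be sketched as follows. The encoding is an assumption of this example: an item of a right-hand side is a letter (string), an index j < i of an earlier nonterminal (integer), or a pair `(a, k)` standing for a^k with k kept as a Python integer. The NFA modification is omitted and the function names are hypothetical.

```python
def power_of(item, a):
    """Number of letters a that `item` stands for (0 if it is not a power of a)."""
    if item == a:
        return 1
    if isinstance(item, tuple) and item[0] == a:
        return item[1]
    return 0


def remove_a_affixes(rules, a):
    """Bottom-up removal of the a-prefix and a-suffix of every nonterminal.

    Returns (new_rules, prefix, suffix), where word(X_i) equals
    a^prefix[i] + word(X_i') + a^suffix[i] and word(X_i') neither starts
    nor ends with a.
    """
    prefix, suffix, empty, new_rules = [], [], [], []
    for rhs in rules:
        body = []
        for item in rhs:
            if isinstance(item, int):
                # Reinsert the affixes popped from X_j; drop X_j if it now
                # defines the empty word.
                if prefix[item]:
                    body.append((a, prefix[item]))
                if not empty[item]:
                    body.append(item)
                if suffix[item]:
                    body.append((a, suffix[item]))
            else:
                body.append(item)
        left = 0
        while body and power_of(body[0], a):
            left += power_of(body.pop(0), a)
        right = 0
        while body and power_of(body[-1], a):
            right += power_of(body.pop(), a)
        prefix.append(left)
        suffix.append(right)
        empty.append(not body)
        new_rules.append(body)
    return new_rules, prefix, suffix


# Usage: word(X_0) = aaba and word(X_1) = a aaba a aaba = a^3 b a^4 b a.
rules = [["a", "a", "b", "a"], ["a", 0, "a", 0]]
```

In this example the new rule for X_1 is X_0 (a,1) a (a,2) X_0, which defines b a^4 b, so a no longer starts or ends any word.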

Lemma 7. Algorithm 4 applied to a letter a ∉ {$, #} runs in polytime and preserves (SLP 1)-
(Aut 2), except that it a-relaxes (Aut 1). G' is in the a-succinct form.

Let word(X_i) = a^{ℓ_i} u_i a^{r_i}, where u_i neither begins nor ends with a. Then, after running
Algorithm 4, word(X_i') = u_i. In particular, after running Algorithm 4 the letter a is inner.

N accepts word(X_n) if and only if N' accepts word(X_n'). If N is a DFA, so is N'.

Since a is no longer an outer letter, we may compress its non-extendible appearances using
Algorithm 3. Some small tweaks are needed to accommodate the a-succinct form of G and the
fact that N is a-relaxed, though basically we just perform appearance compression for inner a,
as described by Algorithm 3: the non-trivial part of Algorithm 3 was the application of Lemma 1,
which works for such large powers of a in npolytime. Other actions of Algorithm 3 generalise in a
simple way.

Lemma 8. Algorithm 3 can be extended, so that it applies to instances satisfying (SLP 1)-(SLP 3)
with G in the a-succinct form and a-relaxed (Aut 1)-(Aut 2). Lemma 6 applies to such an
extension. The output satisfies (SLP 1)-(Aut 2).

4.3.2 Crossing pair compression 

By Lemma 6 all letters a_ℓ are inner. However, a pair of the form a_ℓ b can still be crossing, and
this can happen only when b is a left-outer letter, so we would like to make such b not a left-outer
letter. To do so, we 'pop' one letter from the beginning of each nonterminal (that is, all left-outer
letters), i.e. we modify the grammar so that word(X_i) = first(X_i) word(X_i'), where first(X_i)
denotes the first letter of word(X_i). Clearly, after such operations there are some (perhaps other)
left-outer letters in G. Still, we show that none of a_ℓ b is crossing.

Popping letters is performed similarly to the removal of the a-prefix, i.e. in a bottom-up
fashion, starting from X_1: in a rule X_i → α it is enough to replace each nonterminal X_j in α
by first(X_j) X_j, then store the first letter of α in first(X_i) and remove it from α. It is easy to
modify the NFA N accordingly: when there is a transition δ_N(p, X_i, q), we change it into a chain
of two transitions: δ_{N'}(p, first(X_i), p_i) and δ_{N'}(p_i, X_i', q). This operation is not performed on X_n,
as the letter $ is not going to be compressed anyway. There is a little detail to take care of: if
|word(X_i)| = 1 then popping a letter from X_i creates X_i', which defines ε. Then X_i' should be
removed from the right-hand sides of the rules and in the NFA N we simply replace the transition
by X_i by a transition by word(X_i). This description is formalised in Algorithm 5.
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Algorithm 5 Popping letters 
1: for i <— 1 . . n — 1 do > Popping letters 



2: let Xi -» a 

3: replace each Xj in a by first (Xj)Xj, remove those defining e 

4: set first (Xj) <— the first letter of word (X,) 

5: if i < n then 

6: remove the first letter from a 

7: if there is a transition 5n{p, Xj, q) in X then > NFA modification 

8: remove transition 5^(p, Xi, q) 

9: if |word(Xj)| > 1 then 

10: create new state p\ in X, set transitions: 5j\r(p, first (Xj),pi), 8n(pi, Xj, q) 

11: else 

12: set transition <5/v(p, first(Xj), g) 
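Restricted to the grammar, the popping step can be sketched in a few lines of Python. This is only an illustration under assumptions that are not part of the paper: a rule is a list whose entries are either letters (one-character strings) or 0-based indices of earlier nonterminals, every nonterminal defines a non-empty word, the last rule plays the role of $X_n$ and is rewritten but not popped (following the prose description), and the names `expand` and `pop_first_letters` are hypothetical. The NFA modification is omitted.

```python
def expand(rules, i):
    """Decompress word(X_i); exponential in general, meant for tiny examples only."""
    return "".join(s if isinstance(s, str) else expand(rules, s) for s in rules[i])


def pop_first_letters(rules):
    """Pop the first letter of every nonterminal except the last one.

    Returns (new_rules, first) such that, for every popped nonterminal i,
    word(X_i) == first[i] + word(X'_i), while the last word is unchanged.
    """
    n = len(rules)
    first = [None] * n        # first[i]: the popped letter of X_i
    empty = [False] * n       # empty[i]: X'_i defines the empty word
    new_rules = []
    for i, rhs in enumerate(rules):
        body = []
        for s in rhs:
            if isinstance(s, str):
                body.append(s)
            else:
                body.append(first[s])      # letter popped from X_s
                if not empty[s]:
                    body.append(s)         # X'_s, dropped when it defines the empty word
        if i < n - 1:                      # the last nonterminal is not popped
            first[i] = body.pop(0)
            empty[i] = not body
        new_rules.append(body)
    return new_rules, first
```

For instance, with rules `[["a", "b"], [0, "c", 0], ["$", 1, 0, "#"]]` the second rule becomes `[0, "c", "a", 0]` with popped letter `"a"`, and the word of the last nonterminal is the same before and after.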



Lemma 9. Algorithm 5 runs in polytime and preserves (SLP 1)-(Aut 2). Let $\mathrm{word}(X_i) = bu$,
where $b \in \Sigma$; then $\mathrm{word}(X_i') = u$ for $i < n$ and $\mathrm{word}(X_n') = \mathrm{word}(X_n)$. After running Algorithm 5,
pairs of the form $a_\ell b$ appearing in $\mathrm{word}(X_n)$ are non-crossing.

$N'$ accepts $\mathrm{word}(X_n')$ if and only if N accepts $\mathrm{word}(X_n)$. If N is deterministic, so is $N'$.

Now it is enough to apply the pair compression for non-crossing pairs to each pair of the
form $a_\ell b$. For convenience, we write the whole procedure of pair compression for crossing pairs
in Algorithm 6.



Algorithm 6 Pair compression for crossing pairs a_ℓ b
1: pop the first letter from each nonterminal (run Algorithm 5)
2: for each a_ℓ do
3:     for each b such that a_ℓ b appears in word(X_n) do
4:         run Algorithm 2 for the pair a_ℓ b



Lemma 10. Algorithm 6 runs in polytime and preserves (SLP 1)-(Aut 2). It implements pair
compression for ab, in the sense that $\mathrm{word}(X_i') = PC_{ab \to c}(\mathrm{word}(X_i))$ for each $X_i$.

$N'$ accepts $\mathrm{word}(X_n')$ if and only if N accepts $\mathrm{word}(X_n)$. If N is deterministic, so is $N'$.

4.4 Running time 

Since the running time of each algorithm is npolytime, it is enough to show that the sizes of $\Sigma$, G
and N are always polynomial in n (recall that n is unchanged throughout Algorithm 1).

Lemma 11. During Algorithm 1 the sizes of $\Sigma$, G, N are polynomial in n.

Using Lemmas 5-11 it is now possible to conclude that Algorithm 1 correctly solves the fully
compressed membership problem for NFA in nondeterministic polynomial (in n) time. The only
source of non-determinism is the one in Lemma 1, and so for DFA the corresponding problem can
be solved deterministically.

Appendix

A Additional material for Section 1

(see page 3)

Proof of Theorem 1. The proof follows by showing that Algorithm 1 properly verifies whether
$\mathrm{word}(X_n)$ is accepted by N and that Algorithm 1 runs in npolytime.

Let us first show the correctness of Algorithm 1. All subroutines of Algorithm 1 (non-deterministically)
modify the instance, changing G, N and $X_n$ into $G'$, $N'$ and $X_n'$ (notice that the output depends
on the non-deterministic choices). Let $N^{(i)}$, $G^{(i)}$, $X_n^{(i)}$ for $i = 1, \ldots, k$ be the consecutive
obtained instances, with $i = 1$ representing the input instance. Then $N^{(i)}$ accepts $\mathrm{word}(X_n^{(i)})$ if
and only if for some non-deterministic choices the resulting $N^{(i+1)}$ accepts $\mathrm{word}(X_n^{(i+1)})$. This is
shown in Lemmata 5-10 (if some of the procedures are deterministic, then the output
does not depend on any choices). So, if $N^{(1)}$ does not accept $\mathrm{word}(X_n^{(1)})$, then $N^{(k)}$ also does not
accept $\mathrm{word}(X_n^{(k)})$. On the other hand, if $N^{(1)}$ accepts $\mathrm{word}(X_n^{(1)})$, then there exists a sequence
of instances (representing proper non-deterministic guesses) such that for each i, $N^{(i)}$ accepts
$\mathrm{word}(X_n^{(i)})$. In particular, $N^{(k)}$ does accept $\mathrm{word}(X_n^{(k)})$, and as $|\mathrm{word}(X_n^{(k)})| < n$, $\mathrm{word}(X_n^{(k)})$ can
be decompressed and acceptance by $N^{(k)}$ can be checked naively in polytime.

Now we show that the running time is in fact (non-deterministic) polynomial.
Lemmata 5-10 claim that each of the subroutines runs in npolytime in the size of the
current instance. However, by Lemma 11 the size of this instance is polynomial in n. So it is
left to show that each of the subroutines is run only polynomially many times. Notice, that each
invocation of Algorithms [21 [21 El introduces a new letter to E, and we know by Lemma [TT] that 
the final size of E is poly(n). Moreover, each invocation of Algorithm [H if followed by Algorithm^ 
(in the extended version), similarly, each Algorithm [S] can be associated with Algorithm [5] invoked 
after it. And so also Algorithms 01 [5] are invoked poly(n) times. So the whole running time is 
npolytime. 

It is left to show that if the input is a DFA, Algorithm 1 can be determinised. Firstly, notice
that by Lemmata 5-10, if the instance consisted of a DFA, each instance kept by
Algorithm 1 is also a DFA. Observe that the only non-deterministic choices in Algorithm 1 are
performed when calling a subroutine for a fully compressed membership problem for a string over
an alphabet consisting of a single letter (see Lemma 1). However, the same lemma states that
when the input consists of a deterministic automaton, the problem is in P. Thus, there is no
non-determinism in Algorithm 1. □

B Additional material for Section 2

It is more convenient to represent a path's list of labels as

\[ u_{i_1} X_{i_2} u_{i_3} X_{i_4} \cdots X_{i_{n-1}} u_{i_n}, \]

where each $u_{i_j} \in \Sigma^*$ is a string representing the consecutive letter labels and $X_{i_j}$ represents a
nonterminal label. Notice that $u_{i_j}$ may be empty. We write that $\mathcal{P}$ induces such a list of labels.
The $\mathrm{word}(\mathcal{P})$ defined by such a path $\mathcal{P}$ is

\[ \mathrm{word}(\mathcal{P}) = u_{i_1}\,\mathrm{word}(X_{i_2})\,u_{i_3}\,\mathrm{word}(X_{i_4}) \cdots \mathrm{word}(X_{i_{n-1}})\,u_{i_n}. \]

(see page [JJ 

Proof of Lemma 1. Notice that if the input string w is over an alphabet {a}, no accepting path
in the NFA for w may use transitions that denote strings having letters other than a. Thus, any
such transitions can be deleted from N and we end up with an instance to which the result of
Plandowski and Rytter [18, Theorem 5] can be applied directly.

Consider now the deterministic automaton. As shown in the previous paragraph, we can limit
ourselves to transitions by powers of a. Since the automaton is deterministic, for each state there
is at most one transition labelled with a power of a, and so the path for w cycles after at most
n transitions. As the length of the cycle can be calculated in polytime, the whole problem can be
easily checked in polytime. □
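The deterministic case can be illustrated by the following Python sketch. It assumes, beyond what the proof states, that the automaton has already been restricted to transitions by powers of a and that each label $a^k$ is given directly by its exponent k (a Python integer, so exponents of size $2^n$ pose no problem); the function name and the dictionary encoding of transitions are hypothetical.

```python
def unary_dfa_accepts(trans, start, accepting, length):
    """Decide whether a^length leads from `start` to an accepting state.

    trans maps a state to a pair (k, next_state), meaning its unique
    outgoing transition is labelled a^k with k >= 1.
    """
    offset = {}                 # state -> number of letters read on first arrival
    state, total = start, 0
    while state not in offset:
        offset[state] = total
        if state not in trans:  # dead end: the unique path is finite
            return any(s in accepting and d == length for s, d in offset.items())
        k, state = trans[state]
        total += k
    # `state` is the first repeated state, so the path cycles from here on
    cycle_start = offset[state]
    cycle_len = total - cycle_start
    for s, d in offset.items():
        if s not in accepting:
            continue
        if d == length:
            return True
        on_cycle = d >= cycle_start
        if on_cycle and length > d and (length - d) % cycle_len == 0:
            return True
    return False
```

The walk visits each state at most once before the cycle is detected, so the number of steps is bounded by the number of states; everything else is arithmetic on the exponents.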



C Additional material for Section 3

(see page 5)

Proof of Lemma 2. Since there are n nonterminals, there are at most n left-outer letters and n
right-outer letters. Clearly, they can be calculated in polytime.

We estimate the total number of pairs of letters that appear in $\mathrm{word}(X_1), \ldots, \mathrm{word}(X_n)$.
Consider first such pairs that appear in some explicit string on the right-hand side of some production.
Since the total length of explicit strings is |G|, there are at most |G| such pairs. Other pairs are
assigned to nonterminals $X_1, \ldots, X_n$: a pair ab is assigned to $X_i$ if it appears in $\mathrm{word}(X_i)$ and
it does not appear in $\mathrm{word}(X_1), \ldots, \mathrm{word}(X_{i-1})$. We show that at most three different pairs
are assigned to each nonterminal. In this way, the total number of different crossing pairs is at most
$|G| + 3n$. Indeed, if ab is assigned to $X_i$ with a production $X_i \to uX_jvX_k$, then ab appears
neither in u, v, as in such a case it was already accounted for, nor in $\mathrm{word}(X_j)$, $\mathrm{word}(X_k)$, as in such
a case it is not assigned to $X_i$. Thus, there are three possibilities:

• a is the last letter of u and b is the first letter of $\mathrm{word}(X_j)$,

• a is the last letter of $\mathrm{word}(X_j)$ and b is the first letter of v,

• a is the last letter of v and b is the first letter of $\mathrm{word}(X_k)$.

The cases in which u or v is empty, or there are fewer nonterminals on the right-hand side of the
production, are similar.

The above description can be turned into a straightforward algorithm computing the lists
of all non-crossing and crossing pairs appearing in $\mathrm{word}(X_1), \ldots, \mathrm{word}(X_n)$. First, the list of all
pairs of letters with such appearances is calculated: clearly, it is enough to read every rule (for
$X_i$) and store the pairs that appear in the explicit strings and the pairs that are assigned to $X_i$.
Then, for each pair of letters it should be decided whether it is crossing. To this end, we check
whether it has a crossing appearance in any nonterminal or in N, which can be done in polytime.
Such pairs are crossing, the others are non-crossing. □
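The grammar part of this procedure can be sketched as follows. The encoding is an assumption made for illustration only: a rule is a non-empty list of letters (one-character strings) and 0-based indices of earlier nonterminals, each of which defines a non-empty word. The name `classify_pairs` is hypothetical, only pairs of two different letters are reported, and crossing appearances inside the NFA are not examined.

```python
def classify_pairs(rules):
    """Return (crossing, non_crossing) sets of two-letter strings for an SLP."""
    first, last = [], []               # first/last letter of word(X_i)
    explicit, crossing = set(), set()
    for rhs in rules:
        # (first letter, last letter) of every symbol on the right-hand side
        ends = [(s, s) if isinstance(s, str) else (first[s], last[s]) for s in rhs]
        first.append(ends[0][0])
        last.append(ends[-1][1])
        for x, y, (_, a), (b, _) in zip(rhs, rhs[1:], ends, ends[1:]):
            if a == b:
                continue               # blocks of one letter are handled separately
            if isinstance(x, str) and isinstance(y, str):
                explicit.add(a + b)    # both letters are written explicitly in the rule
            else:
                crossing.add(a + b)    # the pair is split by a nonterminal boundary
    return crossing, explicit - crossing
```

Every pair appearing in some $\mathrm{word}(X_i)$ is met at least once, either inside an explicit string or at a boundary next to a nonterminal, which mirrors the counting argument of the proof.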

Instead of Lemma 3 we show an extended version.

Lemma 12 (stronger variant of Lemma 3). For an inner letter a and a grammar G, which
can be given in an a-succinct form, there are at most |G| different lengths of a's non-extendible
appearances in $\mathrm{word}(X_1), \ldots, \mathrm{word}(X_n)$. The set of these lengths can be calculated in polytime.

Proof. The proof is similar to the proof of Lemma 

Since a is an inner letter, all of its non-extendible appearances are explicit substrings in
productions of G. In particular, each symbol in the productions can be uniquely assigned to the
non-extendible appearance to which it belongs; this is true regardless of whether the symbol represents
a letter or a block of letters written in a succinct way. Thus, there are at most |G| different
non-extendible appearances of a. To calculate the lengths of these appearances, it is enough to read
the explicit strings in the rules, adding the appropriate lengths, which can be done in polytime, as
these lengths are at most $2^n$. □

(see page 5)

Proof of Lemma 4. Consider any two consecutive letters ab, where $a \ne \$$ and $b \ne \#$, appearing in
$\mathrm{word}(X_n)$ at the beginning of the loop starting in line 1. We show that at least one of these two
letters is compressed before the next execution of this loop. In this way, if we partition $\mathrm{word}(X_n)$
into blocks of 4 consecutive letters, each block is shortened by at least one letter in each iteration
of the loop from line 1. Thus the length of $\mathrm{word}(X_n)$ decreases by a factor of 3/4 in each iteration
and so this loop is executed at most O(n) times, as in the input instance $|\mathrm{word}(X_n)| \le 2^n$.

Assume for the sake of contradiction that none of the letters a, b is compressed during this iteration
of the loop.

If a = b, then this pair of consecutive letters is going to be compressed, either in line U or 01 
depending on whether a is inner or outer. Contradiction. 

So suppose now that $a \ne b$. Since the pair ab was not compressed before line 7 in Algorithm 1,
it is crossing in this line and thus at least one of the letters a, b is outer. This outer letter is going
to be compressed with the next letter in line 11 (or earlier). Contradiction.

This ends the case inspection. 

Since in the input instance it holds that $|\mathrm{word}(X_n)| \le 2^n$, this guarantees that the loop in line 1 of
Algorithm 1 is run O(n) times. □
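The counting behind the O(n) bound can be spelled out as a short calculation. The notation $L_k$ for the length of $\mathrm{word}(X_n)$ at the beginning of the k-th iteration is introduced here for illustration only, and the additive O(1) term is an assumption covering the end markers and the last, possibly incomplete, block of four letters.

```latex
% L_k: length of word(X_n) at the beginning of the k-th iteration (notation introduced here)
\begin{align*}
  L_{k+1} &\le \tfrac{3}{4}\, L_k + O(1), \qquad L_1 \le 2^{n}, \\
  L_{k}   &\le \left(\tfrac{3}{4}\right)^{k-1} 2^{n} + O(1).
\end{align*}
% Hence L_k = O(1) as soon as (3/4)^{k-1} 2^n \le 1, that is, for
\[
  k - 1 \;\ge\; \frac{n}{\log_2 (4/3)} \approx 2.41\, n .
\]
```

So, under these assumptions, a linear number of iterations suffices to shrink the word to constant length.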



D Additional material for Section 4

(see page 7)

Proof of Lemma 5. The bound on the running time is obvious from the code.

Since Algorithm 2 only modifies the grammar by shortening some strings in the productions
(it does not create $\epsilon$-rules), and it does not affect \$ and # in the rules, (SLP 1)-(SLP 3) are
preserved. The only modification in N is the introduction of new transitions by a single letter
(namely, by c) between states that are joined by a path for ab. Moreover, if there is a new
transition $\delta_{N'}(p, c, q)$, then p has an outgoing transition by $a \notin \{\$, \#\}$, and so it was
not a starting or accepting state, and q had an incoming transition by $b \notin \{\$, \#\}$, and
similarly it was neither a starting nor an accepting state.
Thus (Aut 1)-(Aut 2) hold for $N'$ as well. Notice that if N is deterministic, so is $N'$: suppose
that there are two different transitions by a letter d from state p in $N'$. If $d \ne c$, then these two
transitions are also present in N, which is not possible, as N is deterministic. If $d = c$, then in N
there are two different paths from p for the string ab, which is also a contradiction.

We now show that $N'$ recognises $\mathrm{word}(X_n')$ if and only if N recognises $\mathrm{word}(X_n)$. To this end
we demonstrate how Algorithm 2 affects $\mathrm{word}(X_i)$:

Claim 1. After performing Algorithm 2 it holds that

\[ \mathrm{word}(X_i') = PC_{ab \to c}(\mathrm{word}(X_i)). \tag{2} \]

Proof. Notice that as $a \ne b$, $PC_{ab \to c}$ is well defined for each string.

The claim follows by a simple induction on the nonterminal's number. Indeed, this is true
when the production for $X_i$ has no nonterminal on the right-hand side (recall the assumption that
$a \ne b$), as in this case the pair compression on the right-hand side of the production for $X_i$ is explicitly
performed. When $X_i \to uX_jvX_k$, then

\[ \mathrm{word}(X_i) = u\,\mathrm{word}(X_j)\,v\,\mathrm{word}(X_k) \]

and

\begin{align*}
\mathrm{word}(X_i') &= PC_{ab \to c}(u)\,\mathrm{word}(X_j')\,PC_{ab \to c}(v)\,\mathrm{word}(X_k') \\
 &= PC_{ab \to c}(u)\,PC_{ab \to c}(\mathrm{word}(X_j))\,PC_{ab \to c}(v)\,PC_{ab \to c}(\mathrm{word}(X_k)),
\end{align*}

with the last equality following by the induction assumption. Notice that since ab is a non-crossing
pair, all occurrences of ab in $\mathrm{word}(X_i)$ are contained in u, v, $\mathrm{word}(X_j)$ or $\mathrm{word}(X_k)$, as otherwise
ab would be a crossing pair, which contradicts the assumption. Thus,

\[ PC_{ab \to c}(\mathrm{word}(X_i)) = PC_{ab \to c}(u)\,PC_{ab \to c}(\mathrm{word}(X_j))\,PC_{ab \to c}(v)\,PC_{ab \to c}(\mathrm{word}(X_k)), \]

which shows that $PC_{ab \to c}(\mathrm{word}(X_i)) = \mathrm{word}(X_i')$, ending the proof of the claim. □
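Claim 1 can be checked on small instances with the following Python sketch. The encoding is assumed for illustration: a rule is a list of letters (one-character strings) and 0-based indices of earlier nonterminals, and the names `expand` and `compress_pair` are hypothetical. The sketch only rewrites explicit occurrences of ab in the right-hand sides, which suffices precisely when $a \ne b$ and ab is non-crossing.

```python
def expand(rules, i):
    """Decompress word(X_i); exponential in general, meant for tiny examples only."""
    return "".join(s if isinstance(s, str) else expand(rules, s) for s in rules[i])


def compress_pair(rules, a, b, c):
    """Replace each explicit occurrence of the letters a, b (adjacent, in this order) by c.

    Assumes a != b and that the pair ab is non-crossing in the grammar.
    """
    new_rules = []
    for rhs in rules:
        out, k = [], 0
        while k < len(rhs):
            if rhs[k] == a and k + 1 < len(rhs) and rhs[k + 1] == b:
                out.append(c)          # the pair ab becomes the fresh letter c
                k += 2
            else:
                out.append(rhs[k])     # other letters and nonterminals are copied
                k += 1
        new_rules.append(out)
    return new_rules
```

On the grammar `[["a", "b", "c"], [0, "a", "b", 0], ["$", 1, "#"]]`, where ab is non-crossing, compressing ab into a fresh letter d turns the word of the second nonterminal from abcababc into dcddc, which is what the claim predicts.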

The second claim similarly establishes how the pair compression of a non-crossing pair affects
the NFA; more precisely, what happens to a string defined by a path in the NFA after applying
pair compression to the underlying NFA.

Claim 2. Consider a non-crossing pair ab and a path $\mathcal{P}$ in the NFA N, which defines a list of labels

\[ u_{i_1} X_{i_2} u_{i_3} X_{i_4} \cdots X_{i_{n-1}} u_{i_n}, \]

where each $u_{i_j} \in \Sigma^*$ is a string representing the consecutive letter labels and $X_{i_j}$ represents a
transition by a nonterminal. Then

\[ PC_{ab \to c}(\mathrm{word}(\mathcal{P})) = PC_{ab \to c}(u_{i_1})\,\mathrm{word}(X_{i_2}')\,PC_{ab \to c}(u_{i_3})\,\mathrm{word}(X_{i_4}') \cdots \mathrm{word}(X_{i_{n-1}}')\,PC_{ab \to c}(u_{i_n}). \tag{3} \]

Proof. Similarly as in Claim 1, notice that as ab is a non-crossing pair, an appearance of ab in
the string defined by $\mathcal{P}$ cannot be split between a nonterminal and a string (or another nonterminal).
Thus, the replacement of pairs ab takes place either wholly inside a string $u_{i_j}$ or inside $\mathrm{word}(X_{i_j})$. The
former is done explicitly by $PC_{ab \to c}$, while (2) establishes the form of the latter. This ends the proof
of the claim. □

Now, after proving Claims 1-2 it is easy to show the main thesis of the lemma, i.e. that
$\mathrm{word}(X_n')$ is accepted by $N'$ if and only if $\mathrm{word}(X_n)$ is accepted by N.

(⇒) Suppose first that $\mathrm{word}(X_n)$ is accepted by N. Consider the accepting path $\mathcal{P}$ for $\mathrm{word}(X_n)$ and
represent it as a list of labels $u_{i_1} X_{i_2} u_{i_3} X_{i_4} \cdots X_{i_{n-1}} u_{i_n}$, similarly as in Claim 2. Of course,

\[ \mathrm{word}(\mathcal{P}) = \mathrm{word}(X_n) = u_{i_1}\,\mathrm{word}(X_{i_2})\,u_{i_3}\,\mathrm{word}(X_{i_4}) \cdots \mathrm{word}(X_{i_{n-1}})\,u_{i_n}. \tag{4} \]

We will construct an accepting path $\mathcal{P}'$ in $N'$ inducing a list of labels

\[ PC_{ab \to c}(u_{i_1})\,X_{i_2}'\,PC_{ab \to c}(u_{i_3})\,X_{i_4}' \cdots X_{i_{n-1}}'\,PC_{ab \to c}(u_{i_n}). \tag{5} \]

Using (3) and recalling that $\mathrm{word}(\mathcal{P}) = \mathrm{word}(X_n)$ will be enough to conclude that $\mathrm{word}(\mathcal{P}') =
PC_{ab \to c}(\mathrm{word}(X_n))$.

Notice that by Algorithm 2

• if there is a transition $\delta_N(p, d, q)$ for a letter $d \in \Sigma$ in N, then there is the same transition
$\delta_{N'}(p, d, q)$ in $N'$;

• if there is a path from p to q for the string ab in N, then there is a transition $\delta_{N'}(p, c, q)$ in $N'$.

Thus, by a trivial induction on the length of the string u, if $\delta_N(p, u, q)$ then also $\delta_{N'}(p, PC_{ab \to c}(u), q)$.
The situation is similar for nonterminals: if there is a transition $\delta_N(p, X_i, q)$ in N, then there is an
analogous transition $\delta_{N'}(p, X_i', q)$ in $N'$. Thus, a path $\mathcal{P}'$ with the same starting and ending state
as $\mathcal{P}$ and the list of labels as in (5) is inductively defined. Since the starting (accepting) states in
N and $N'$ coincide, this shows that $\mathrm{word}(\mathcal{P}')$ is accepted by $N'$.

(⇐) Suppose now that the string $PC_{ab \to c}(\mathrm{word}(X_n))$ is recognised by $N'$. Let the path of the
accepting computation in $N'$ be $\mathcal{P}'$, with a list of labels

\[ u_{i_1}' X_{i_2}' u_{i_3}' X_{i_4}' \cdots X_{i_{n-1}}' u_{i_n}'. \]

Similarly to the previous case, we will inductively define an accepting path $\mathcal{P}$ in N with a list of
labels

\[ u_{i_1} X_{i_2} u_{i_3} X_{i_4} \cdots X_{i_{n-1}} u_{i_n}, \tag{6} \]

where $u_{i_j}$ is obtained from $u_{i_j}'$ by replacing each c by ab.
Notice that by Algorithm 2

• if there is a letter transition $\delta_{N'}(p, c, q)$ in $N'$, then there is a path from p to q for the string ab in
N;

• if there is a letter transition $\delta_{N'}(p, d, q)$ for a letter $d \ne c$, then there is the same transition
$\delta_N(p, d, q)$ in N.

Thus, by a simple induction, we conclude that if there is a path in $N'$ from p to q for a string $u_{i_j}'$,
then there is a path in N from p to q for the string $u_{i_j}$. Now observe that if there is a transition
$\delta_{N'}(p, X_i', q)$ in $N'$, then there is an analogous transition $\delta_N(p, X_i, q)$ in N. This completes the
construction of $\mathcal{P}$. Since the starting and accepting states in N and $N'$ coincide, the constructed
path $\mathcal{P}$ is also accepting; furthermore, the list of labels of $\mathcal{P}$ is as in (6).

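It is left to show that $\mathrm{word}(\mathcal{P}) = \mathrm{word}(X_n)$. Since $PC_{ab \to c}$ is a one-to-one function, it is
enough to show that $PC_{ab \to c}(\mathrm{word}(\mathcal{P})) = PC_{ab \to c}(\mathrm{word}(X_n))$. Notice that the latter equals
$\mathrm{word}(X_n')$, by (2).

Using (3) we can conclude that

\begin{align*}
PC_{ab \to c}(\mathrm{word}(\mathcal{P})) &= PC_{ab \to c}(u_{i_1})\,\mathrm{word}(X_{i_2}')\,PC_{ab \to c}(u_{i_3})\,\mathrm{word}(X_{i_4}') \cdots \mathrm{word}(X_{i_{n-1}}')\,PC_{ab \to c}(u_{i_n}) \\
 &= u_{i_1}'\,\mathrm{word}(X_{i_2}')\,u_{i_3}'\,\mathrm{word}(X_{i_4}') \cdots \mathrm{word}(X_{i_{n-1}}')\,u_{i_n}' \\
 &= \mathrm{word}(\mathcal{P}') \\
 &= \mathrm{word}(X_n'),
\end{align*}

which concludes the proof. □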

(see page[S]) 

Proof of Lemma 6, stronger version. We shall prove Lemma 6 in a slightly stronger version: we allow
the grammar to be in an a-succinct form and the NFA N is assumed to satisfy the a-relaxed (Aut 1),
see appropriate definition between Algorithm U] and Lemma UJ This weaker assumption will allow 
proving Lemma [5] here. 

Notice that by Lemma 12 there are only $|G| + 4n$ possible lengths of a's non-extendible
appearances and that they can be calculated in polytime; hence line 1 takes polytime. All the loops
in Algorithm 3 have only polynomially many iterations. All operations listed in Algorithm 3 are
elementary and can clearly be performed in polytime, except replacing each $a^\ell$ by $a_\ell$ in a rule in
line U and for the verification in lineal for the former operation, it is enough to read the rules of 
the grammar: recall that $a^\ell$ is represented as a pair $(a, \ell)$. Since $\ell \le 2^n$, addition of the lengths
can be performed in polytime. The verification is more involved; we outline how to perform it. For
given two states p, q and a string $a^\ell$ we want to verify whether there is a path from p to q for the
string $a^\ell$ (notice that since a is inner, none of $\mathrm{word}(X_i)$ is a power of a). This can be rephrased
as a fully compressed membership problem for NFA over a unary alphabet: it is enough to

• restrict NFA N to transitions by powers of a, 

• make p the unique starting state, 

• make q the unique accepting state. 

Notice that we can restrict ourselves to transitions by powers of a: since a is an inner letter, no
$\mathrm{word}(X_i)$ begins or ends with a. Hence, when a path in the NFA defines $a^\ell$, this path uses
only transitions by powers of a.

Observe that all considered powers $a^\ell$ appear in strings defined by G, and so $\ell \le 2^n$. So each
such $a^\ell$ can be represented by an SLP of polynomial size. In particular, Lemma 1 is applicable
here and so the verification is in npolytime.

Concerning the preservation of invariants: we first show that $G'$ is not in the a-succinct
form, nor is $N'$ a-relaxed. When Algorithm 3 finishes its work, all non-extendible appearances of
a are replaced; in particular, there are no succinct representations of powers of a inside the grammar.
This should also apply to transitions labelled with powers of a in N, and so the last line of Algorithm 3
should be modified, so that all transitions by powers of a are removed.

Now we can return to showing the preservation of the invariants: since the only change to
the productions consists of replacing non-extendible occurrences of a by a single letter,
(SLP 1)-(SLP 3) are preserved. Also, the only modification to the NFA is the addition of new letter
transitions. Thus, (Aut 1) holds. To see that (Aut 2) also holds, notice that if p receives a new
incoming (outgoing) transition in $N'$, this transition is by some letter $a_\ell$ and p had an incoming
(outgoing, respectively) transition by a power of a in N. In particular, the starting and accepting states
remain unaffected and no transitions by \$ and # are introduced. Thus, (Aut 2) also holds for $N'$.

Notice that if N is deterministic, so is $N'$: suppose that there are two different transitions by
d from the same state in $N'$. If d is not one of the new letters, i.e. it is a nonterminal or an old letter,
then the same transitions were present in N, contradiction. If there are two transitions by a new letter $a_\ell$ from
the same state p, then in N there are two different paths for $a^\ell$ from state p, contradiction.

Claim 3. After performing Algorithm 3 it holds that

\[ \mathrm{word}(X_i') = AC_a(\mathrm{word}(X_i)). \tag{7} \]

Claim 4. Consider an inner letter a and a path $\mathcal{P}$ in N, which has a list of labels

\[ u_{i_1} X_{i_2} u_{i_3} X_{i_4} \cdots X_{i_{n-1}} u_{i_n}, \]

where each $u_{i_j} \in \Sigma^*$ is a string representing the consecutive letter labels and $X_{i_j}$ represents a
nonterminal transition, similarly as in Claim 2. Then

\[ AC_a(\mathrm{word}(\mathcal{P})) = AC_a(u_{i_1})\,\mathrm{word}(X_{i_2}')\,AC_a(u_{i_3})\,\mathrm{word}(X_{i_4}') \cdots \mathrm{word}(X_{i_{n-1}}')\,AC_a(u_{i_n}). \tag{8} \]

The proofs are analogous to those of Claims 1 and 2 in the proof of Lemma 5 and are thus omitted.
Notice that the properties stated in Claims 3-4 do not depend on the non-deterministic choices of Algorithm 3.

It is left to show the main claim of the lemma: N recognises $\mathrm{word}(X_n)$ if and only if the NFA $N'$
recognises $\mathrm{word}(X_n')$ for some non-deterministic choices.

(⇐) Suppose first that $N'$ accepts $\mathrm{word}(X_n')$, using the path $\mathcal{P}'$. Clearly $\mathrm{word}(\mathcal{P}') =
\mathrm{word}(X_n')$. Let the list of labels on $\mathcal{P}'$ be

\[ u_{i_1}' X_{i_2}' u_{i_3}' X_{i_4}' \cdots X_{i_{n-1}}' u_{i_n}'. \]

Let $u_{i_j}$ be obtained from $u_{i_j}'$ by replacing each $a_{\ell_m}$ with $a^{\ell_m}$. We shall construct a path $\mathcal{P}$ in N,
which has the same starting and ending state as $\mathcal{P}'$ and induces a list of labels

\[ u_{i_1} X_{i_2} u_{i_3} X_{i_4} \cdots X_{i_{n-1}} u_{i_n}. \]

Notice that

• if there is a transition $\delta_{N'}(p, X_i', q)$ in $N'$, then there is a transition $\delta_N(p, X_i, q)$ in N;

• if there is a transition $\delta_{N'}(p, b, q)$ for $b \ne a$ in $N'$, then there is a transition $\delta_N(p, b, q)$ in N;

• if there is a transition $\delta_{N'}(p, a_{\ell_m}, q)$ in $N'$ for some $a^{\ell_m}$ that has a non-extendible appearance in one of
$\mathrm{word}(X_1), \ldots, \mathrm{word}(X_n)$, then there is a path from p to q for the string $a^{\ell_m}$ in N.

Therefore, by an easy induction, $\mathcal{P}$ is a valid path in N; moreover, since $\mathcal{P}'$ is accepting, so is $\mathcal{P}$. It
is left to demonstrate that $\mathcal{P}$ defines $\mathrm{word}(X_n)$: since $AC_a$ is a one-to-one function, it is enough to
show that $AC_a(\mathrm{word}(\mathcal{P})) = AC_a(\mathrm{word}(X_n))$. By (7) it holds that $AC_a(\mathrm{word}(X_n)) = \mathrm{word}(X_n')$.
The value of $AC_a(\mathrm{word}(\mathcal{P}))$ is already known from (8), and so it is enough to show that

\[ AC_a(u_{i_1})\,\mathrm{word}(X_{i_2}')\,AC_a(u_{i_3})\,\mathrm{word}(X_{i_4}') \cdots \mathrm{word}(X_{i_{n-1}}')\,AC_a(u_{i_n}) = \mathrm{word}(X_n'). \]

But this is simply the fact that the path $\mathcal{P}'$ defines $\mathrm{word}(X_n')$, which holds by the assumption.

(⇒) Suppose now that N accepts $\mathrm{word}(X_n)$. Consider the case in which Algorithm 3 always
made a correct non-deterministic choice, i.e. each verification it performed gave the correct answer.

Let the accepting path $\mathcal{P}$ in N have a list of labels

\[ u_{i_1} X_{i_2} u_{i_3} X_{i_4} \cdots X_{i_{n-1}} u_{i_n}, \]

where, similarly as in Claim 4, each $u_{i_j}$ is a string representing the consecutive letter labels ($u_{i_j}$
may be empty) and $X_{i_j}$ represents a transition by a nonterminal. By the definition,

\[ \mathrm{word}(X_n) = u_{i_1}\,\mathrm{word}(X_{i_2})\,u_{i_3}\,\mathrm{word}(X_{i_4}) \cdots \mathrm{word}(X_{i_{n-1}})\,u_{i_n}. \]

Now consider a's appearance compression applied to both sides of this equality. By (7) and (8),

\[ \mathrm{word}(X_n') = AC_a(u_{i_1})\,\mathrm{word}(X_{i_2}')\,AC_a(u_{i_3})\,\mathrm{word}(X_{i_4}') \cdots \mathrm{word}(X_{i_{n-1}}')\,AC_a(u_{i_n}). \]

We will construct an accepting path $\mathcal{P}'$ with a list of labels

\[ AC_a(u_{i_1})\,X_{i_2}'\,AC_a(u_{i_3})\,X_{i_4}' \cdots X_{i_{n-1}}'\,AC_a(u_{i_n}). \]

Notice that $\mathrm{word}(\mathcal{P}') = \mathrm{word}(X_n')$, and so the construction of such a path $\mathcal{P}'$ will conclude the proof.
We iteratively transform $\mathcal{P}$ into $\mathcal{P}'$. Notice that

• if there is a transition $\delta_N(p, X_i, q)$ in N, then there is a transition $\delta_{N'}(p, X_i', q)$ in $N'$;

• if there is a transition $\delta_N(p, b, q)$ for $b \ne a$ in N, then there is a transition $\delta_{N'}(p, b, q)$ in $N'$;

• if there is a path in N from p to q for a string $a^\ell$ that has a non-extendible appearance in some
$\mathrm{word}(X_i)$, then there is a transition $\delta_{N'}(p, a_\ell, q)$ in $N'$ (by the assumption that Algorithm 3
guessed correctly).

And by an easy induction $\mathcal{P}'$ is a valid path in $N'$ and has the same starting and ending state as $\mathcal{P}$.
Since $\mathcal{P}$ is accepting in N and the starting and accepting states in N and $N'$ coincide, $\mathcal{P}'$ is an
accepting path in $N'$. □

(see page [5]) 

Proof of Lemma 7. We first explain how to calculate the a-prefix (a-suffix); since G is in a-succinct
form, this might be non-obvious. It is enough to scan the explicit strings stored in the productions'
right-hand sides, summing the lengths of the consecutive appearances of a. This clearly works in
polytime also for G stored in an a-succinct form, as the powers of a may have length at most $2^n$,
and so the length of their representation is linear in n (the correctness of this approach is shown
later).

Algorithm 4 runs in polytime, as the first loop has n iterations and the second |N| iterations,
and each line can be performed in polytime.

Concerning the preservation of the invariants: in each rule of the grammar at most 4
non-extendible appearances of a are introduced. They may be long; however, in compressed form we
treat them as single symbols. In this way the rules of G are stored in an a-succinct form, which
was explicitly allowed. Then the a-prefix and a-suffix are removed and nonterminals defining $\epsilon$
are removed from the right-hand sides of the productions. This does not affect
(SLP 1)-(SLP 3) (recall that a is neither \$ nor #). Since the NFA is also changed, we inspect the
invariants regarding N: introducing new states $p_1$, $q_1$ and replacing the transition $\delta_N(p, X_i, q)$ by
three transitions $\delta_{N'}(p, a^{\ell_i}, p_1)$, $\delta_{N'}(p_1, X_i', q_1)$, $\delta_{N'}(q_1, a^{r_i}, q)$ preserves (Aut 1)-(Aut 2), with
the exception that it a-relaxes (Aut 1).

Notice that if N is deterministic, so is $N'$: as already mentioned, the only change done to N is
the replacement of a transition by $X_i$ by a path of three transitions, such that the two states in the
middle have exactly one incoming and one outgoing transition, which clearly preserves determinism of
the automaton.

To show the correctness, we prove by induction on i the two main claims of the lemma:

• Algorithm 4 correctly calculates the lengths of the a-prefix and a-suffix of $\mathrm{word}(X_i)$, i.e. $\ell_i$
and $r_i$,

• $\mathrm{word}(X_i) = a^{\ell_i}\,\mathrm{word}(X_i')\,a^{r_i}$.

As a simple corollary, observe that these two conditions imply that a is neither the last nor the first
letter of $\mathrm{word}(X_i')$.

For $i = 1$ notice that the whole production for $X_1$ is stored explicitly, and so Algorithm 4
correctly calculates the a-prefix and a-suffix of $\mathrm{word}(X_1)$, and after their removal $\mathrm{word}(X_1) =
a^{\ell_1}\,\mathrm{word}(X_1')\,a^{r_1}$.

For the induction step, let $X_i \to uX_jvX_k$. Then by the induction assumption:

\begin{align*}
\mathrm{word}(X_i) &= u\,\mathrm{word}(X_j)\,v\,\mathrm{word}(X_k) \\
 &= u\,a^{\ell_j}\,\mathrm{word}(X_j')\,a^{r_j}\,v\,a^{\ell_k}\,\mathrm{word}(X_k')\,a^{r_k}.
\end{align*}

There are cases to consider, depending on whether $\mathrm{word}(X_j') = \epsilon$ or not and whether $\mathrm{word}(X_k') = \epsilon$
or not. We will describe the one with $\mathrm{word}(X_j') \ne \epsilon$ and $\mathrm{word}(X_k') = \epsilon$; the other cases are treated
in a similar way. Then the rule is rewritten as $u\,a^{\ell_j} X_j' a^{r_j}\,v\,a^{\ell_k} a^{r_k}$. Since by the inductive
assumption a is neither the first nor the last letter of $\mathrm{word}(X_j')$, the a-prefix (a-suffix) of $\mathrm{word}(X_i)$
is the a-prefix (a-suffix, respectively) of $u\,a^{\ell_j}$ (of $a^{r_j}\,v\,a^{\ell_k} a^{r_k}$, respectively). Thus it is correctly
calculated by Algorithm 4. As the last action, Algorithm 4 removes the a-prefix and a-suffix,
which shows that $a^{\ell_i}\,\mathrm{word}(X_i')\,a^{r_i} = \mathrm{word}(X_i)$.

It is left to show that N accepts $\mathrm{word}(X_n)$ if and only if $N'$ accepts $\mathrm{word}(X_n')$. To this
end, notice that the only modification to N is the replacement of a transition of the form
$\delta_N(p, X_i, q)$ by a path labelled with $a^{\ell_i}$, $X_i'$, $a^{r_i}$. Furthermore, the two vertices inside this path
have only one incoming and one outgoing transition. The path labelled with $a^{\ell_i}$, $X_i'$, $a^{r_i}$ defines
the string $a^{\ell_i}\,\mathrm{word}(X_i')\,a^{r_i}$, which was already shown to be $\mathrm{word}(X_i)$. It is left to observe that
none of the newly introduced states in the middle of the path is accepting or starting. Hence
the starting (accepting) states of N and $N'$ coincide, and each string is accepted by N if and only if
it is accepted by $N'$. □

(see pageO 

Proof of Lemma 8. Notice that Lemma 6 was proved in a stronger version, which
coincides with the statement of Lemma 8. □

(see page 10)

Proof of Lemma 9. The majority of the proof is similar to the proof of Lemma 7. Nevertheless, it is
written for completeness.

The loop is executed polynomially many times, and each line of the code can be performed
in polytime, and so in total Algorithm 5 runs in polytime.

Concerning the preservation of invariants: since the only operation performed on G is replacing
a nonterminal $X_i$ by $aX_i'$ and then deleting the first letter of the nonterminal, and nonterminals
generating $\epsilon$ are explicitly removed from the rules, the resulting grammar is in the form (1). Notice
that the invariants (SLP 1)-(SLP 3) are clearly preserved by the listed operations. Since the first
and last symbol of $\mathrm{word}(X_n)$ is not modified, (SLP 3) holds as well.

Let us move to the NFA invariants: the only change applied to the NFA is the replacement of a
transition $\delta_N(p, X_i, q)$ by a path $\delta_{N'}(p, \mathrm{first}(X_i), p_1)$, $\delta_{N'}(p_1, X_i', q)$, or by $\delta_{N'}(p, \mathrm{first}(X_i), q)$,
which does not affect (Aut 1)-(Aut 2). For the same reason, if N is deterministic, so is $N'$.

We first show by induction the second claim of the lemma, i.e. that if $\mathrm{word}(X_i) = bu$ for some
$b \in \Sigma$, $u \in \Sigma^*$, then $\mathrm{word}(X_i') = u$. For the induction basis consider $i = 1$: let the rule for $X_1$
be $X_1 \to bu$ for some $b \in \Sigma$ and $u \in \Sigma^*$. Then $\mathrm{first}(X_1) = b$ and $u' = u$ and the production for $X_1'$ is
$X_1' \to u'$. Clearly the claim holds in this case.

For the inductive step, consider $i > 1$ and let the production of $X_i$ be $X_i \to uX_jvX_k$. Then
$\mathrm{word}(X_i) = u\,\mathrm{word}(X_j)\,v\,\mathrm{word}(X_k)$ and by the induction assumption:

\[ u\,\mathrm{word}(X_j)\,v\,\mathrm{word}(X_k) = u\,\mathrm{first}(X_j)\,\mathrm{word}(X_j')\,v\,\mathrm{first}(X_k)\,\mathrm{word}(X_k'). \]

Since $u\,\mathrm{first}(X_j)$ is non-empty, $\mathrm{first}(X_i)$ can be computed in line 4 of Algorithm 5. We conclude
that

\begin{align*}
\mathrm{word}(X_i) &= u\,\mathrm{first}(X_j)\,\mathrm{word}(X_j')\,v\,\mathrm{first}(X_k)\,\mathrm{word}(X_k') \\
 &= \mathrm{first}(X_i)\,u'\,\mathrm{first}(X_j)\,\mathrm{word}(X_j')\,v\,\mathrm{first}(X_k)\,\mathrm{word}(X_k') \\
 &= \mathrm{first}(X_i)\,\mathrm{word}(X_i'),
\end{align*}

where $u'$ is u without its first letter (if u is empty, the popped letter is $\mathrm{first}(X_j)$ and the argument is analogous).

The analysis for other forms of the rule, i.e. $X_i \to uX_jv$ or $X_i \to u$, is similar.

We show now that $N'$ accepts exactly the same strings as N. The only change done in the
NFA is the replacement of transitions of the form $\delta_N(p, X_i, q)$ by a path inducing a list of labels with
$\mathrm{first}(X_i)$ and $X_i'$, or by a transition $\delta_{N'}(p, \mathrm{first}(X_i), q)$; let us consider the former case, the latter is
similar. Notice that $\mathrm{word}(X_i) = \mathrm{first}(X_i)\,\mathrm{word}(X_i')$ and so the new path denotes the same string
as the replaced transition. Furthermore, the newly introduced state in the middle of this path has
only one incoming and one outgoing transition. The starting and accepting states were not modified,
and so both automata recognise the same strings. Since $\mathrm{word}(X_n) = \mathrm{word}(X_n')$, this ends the
proof.

It is left to show, that none of the pairs agb is crossing. Assume for the sake of contradiction, 
that some of such pairs agb is. The analysis splits between cases, why the pair agb is crossing. 

$a_\ell b$ has a crossing appearance in the NFA $N'$. In this case there is a state $p$ and a pair of
incoming and outgoing transitions such that $a_\ell$ is the last letter of the string encoded on
the incoming transition and $b$ is the first letter of the string encoded on the outgoing transition.
Moreover, at least one of these transitions is labelled with a nonterminal; we distinguish
two subcases:

transition into $p$ is labelled with $X'_i$. In such a case $a_\ell$ is the last letter of $\mathrm{word}(X'_i)$
and, since $\mathrm{word}(X_i) = \mathrm{first}(X_i)\,\mathrm{word}(X'_i)$, also the last letter of $\mathrm{word}(X_i)$. Thus $a_\ell$ is a
right-outer letter in $G$, which contradicts Lemma [S].

transition into $p$ is labelled with a letter and transition from $p$ is labelled with $X'_i$.
In this case $a_\ell$ is the label of the transition incoming to $p$. Observe that, due to the
algorithm, if $X'_i$ labels the transition from $p$, then there is a unique transition to $p$, labelled
with $\mathrm{first}(X_i)$. This means that $a_\ell$ is the first letter of $\mathrm{word}(X_i)$, which is not possible
by Lemma [5].

$a_\ell b$ has a crossing appearance in nonterminal $X'_i$. Let the production for $X'_i$ be $X'_i \to uX'_jvX'_k$;
the cases of other productions are similar. There are three possibilities for a crossing
appearance of $a_\ell b$ in $\mathrm{word}(X'_i)$:

$a_\ell$ is the last letter in $u$ and $b$ is the first letter in $\mathrm{word}(X'_j)$. Algorithm [5] replaced $X_j$
with $\mathrm{first}(X_j)X'_j$ in the rule for $X'_i$. Thus, since $a_\ell$ is the letter preceding $X'_j$ in the rule
for $X'_i$, it holds that $a_\ell$ was the first letter in $\mathrm{word}(X_j)$, which contradicts
Lemma [S].

$a_\ell$ is the last letter in $\mathrm{word}(X'_j)$ and $b$ is the first letter in $v$. Since the last letters of
$\mathrm{word}(X'_j)$ and $\mathrm{word}(X_j)$ are the same, as $\mathrm{word}(X_j) = \mathrm{first}(X_j)\,\mathrm{word}(X'_j)$, $a_\ell$ is the last
letter of $\mathrm{word}(X_j)$, which contradicts Lemma [SJ.

$a_\ell$ is the last letter in $v$ and $b$ is the first letter in $\mathrm{word}(X'_k)$. The proof is the same as
in the first case. □
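
The three possibilities listed above can be enumerated mechanically for the grammar part of the argument. The following Python sketch assumes, as a reading of those three cases, that a pair $ab$ of distinct letters has a crossing appearance in a rule whenever $a$ ends the part to the left of a nonterminal boundary and $b$ starts the part to its right. The names `first_letter`, `last_letter` and `crossing_pairs` and the rule encoding are hypothetical, every rule is assumed to be non-empty, and crossing appearances in the NFA are not covered.

```python
def first_letter(rules, symbol):
    """First letter of word(symbol); symbols without a rule are terminals."""
    while symbol in rules:
        symbol = rules[symbol][0]
    return symbol


def last_letter(rules, symbol):
    """Last letter of word(symbol); symbols without a rule are terminals."""
    while symbol in rules:
        symbol = rules[symbol][-1]
    return symbol


def crossing_pairs(rules):
    """Collect pairs (a, b), a != b, that straddle a nonterminal boundary in some rule."""
    pairs = set()
    for rhs in rules.values():
        for left, right in zip(rhs, rhs[1:]):
            # Only boundaries that touch a nonterminal can give a crossing appearance.
            if left in rules or right in rules:
                a = last_letter(rules, left)
                b = first_letter(rules, right)
                if a != b:
                    pairs.add((a, b))
    return pairs


# Toy grammar (hypothetical): X1 -> a b,  X2 -> c X1 d X1
rules = {"X1": ["a", "b"], "X2": ["c", "X1", "d", "X1"]}
assert crossing_pairs(rules) == {("c", "a"), ("b", "d"), ("d", "a")}
```

In the toy grammar the pair `ab` inside the rule for `X1` is not reported, since both letters are explicit in the same rule; the three reported pairs correspond, in order, to the three possibilities of the case analysis.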

Proof of Lemma\l(A. Notice first that there are at most $|G|$ different letters $a_\ell$: they were
introduced by Algorithm [3] as representations of non-extendible appearances of $a$, and by Lemma [12]
there are at most $|G|$ different such appearances. The other operations of the algorithm consist of
calls to Algorithm [5] and Algorithm^, and thus the running time and the preservation of the invariants
follow from the properties of these two algorithms, stated in Lemma [5] and Lemma □

Proof of Lemma [771. We first bound the size of $G$. We show that at the beginning of each iteration
of the main loop in Algorithm [1] each right-hand side of a production has at most $48n$ explicit
letters, and that inside each iteration of the main loop of Algorithm [1] at most $12n$ new
letters are added (we exclude the letters replacing compressed strings). Notice that during the run of
Algorithm [1] the grammar $G$ may be in succinct form, and accordingly we treat $a^\ell$ as one symbol.

Clearly, these bounds hold when Algorithm [1] starts working. Let us fix a rule and consider
how many new letters may be introduced in this rule. There are only two cases in which new
letters (not coming from compression) are introduced to a rule:

changing an outer letter to an inner one (line [3] of Algorithm [4]). At most 4 new
powers of $a$ (all possibly in succinct form) may be introduced in each invocation of this
algorithm. While these powers of $a$ are written in a succinct representation, they will all be
replaced by single letters in the later appearance compression for $a$.

In total, this algorithm is invoked for each outer letter, and there are at most $2n$ such letters,
by Lemma [5]. So at most $8n$ new symbols are introduced in this way in one iteration
of the main loop of Algorithm [1].

popping letters (line [3] of Algorithm [5]). Each invocation of this algorithm introduces at most
2 new symbols. As it is also used for each outer letter, at most $4n$
new symbols are introduced in this way per iteration of the main loop of Algorithm [1].

Hence, at most $12n$ new letters are added to the right-hand side of a production in each
iteration of the main loop in Algorithm [1]. Still, the main task performed by Algorithm [1] is the
compression: an argument similar to the one in the proof of Lemma 0] can be used to show that the size of
the explicit strings in the rules decreases by a factor of $3/4$ in each iteration of the loop from line [TJ
in Algorithm [1]. Of course, the newly introduced letters may be unaffected by this compression.
It is left to verify that $48n$ is indeed an upper bound on the size of the right-hand side of a rule.
Let the rule be $X_i \to \alpha$ at the beginning of the iteration and $X_i \to \alpha'$ at its end. Then

\[ |\alpha'| \le \tfrac{3}{4} \cdot |\alpha| + 12n \le \tfrac{3}{4} \cdot 48n + 12n \le 48n, \]

which proves the bound on $|G|$ at the end of each iteration of the main loop. Notice that, as
at most $12n$ new letters are added to this rule, inside the phase the size of $G$ is at most $60n^2$.
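
The bound can also be checked numerically by iterating the worst case of the inequality. The Python sketch below assumes the simplified model $s \mapsto \tfrac{3}{4}s + 12n$ for the length of a single right-hand side and uses exact rational arithmetic; the function name `max_rule_size` and its parameters are hypothetical and serve only this illustration.

```python
from fractions import Fraction


def max_rule_size(n, start, iterations):
    """Iterate the worst case s -> (3/4) * s + 12n for one right-hand side.

    Returns (end_max, phase_max): the largest size seen at the end of an
    iteration and the largest size seen inside an iteration, i.e. after the
    12n new letters are added but before compression shrinks the rule.
    """
    size = Fraction(start)
    end_max, phase_max = size, size
    for _ in range(iterations):
        phase_max = max(phase_max, size + 12 * n)  # inside the phase
        size = Fraction(3, 4) * size + 12 * n      # at the end of the iteration
        end_max = max(end_max, size)
    return end_max, phase_max


n = 5
end_max, phase_max = max_rule_size(n, start=48 * n, iterations=100)
assert end_max <= 48 * n and phase_max <= 60 * n
```

The value $48n$ is the fixed point of the map, which is why a rule that starts with at most $48n$ explicit letters never exceeds this length at the end of an iteration; multiplying the in-phase bound of $60n$ per rule by the $n$ rules gives the $60n^2$ stated above.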

We now turn our attention to the size of $\Sigma$. Again, consider the execution of Algorithm [1]
and one iteration of the main loop. We show that $\mathrm{poly}(n)$ letters are introduced in one
such iteration. New letters are introduced when compression of pairs or appearance compression
is applied. There are the following possibilities:

compression of an inner letter (line [TJ of Algorithm [2]). Each compression of an inner
letter decreases the total length of explicit strings used in $G$ by at least 1. Since the size
of each right-hand side is at most $48n$ at the beginning of the iteration and at
most $12n$ new letters are introduced to a rule in each iteration, there can be at most $60n^2$ such
compressions.

compression of a non-crossing pair (line [4] of Algorithm [3]). The same argument as in the
previous case applies.

appearance compression of an outer letter (Algorithm [4] followed by Algorithm [3]).
There are at most $2n$ outer letters, by Lemma [2J, and each of them has at most $|G|$ different
lengths of non-extendible appearances, by Lemma [12]. Compressing all of them introduces
at most $2n|G|$ new letters to $\Sigma$.

compression of a crossing pair (call to Algorithm [2] made by Algorithm [6]). The crossing
pairs compression is run for each of the $2n$ outer letters. When such an outer letter is fixed,
there are at most $|G| + 3n$ different pairs of letters appearing in $\mathrm{word}(X_n)$, by Lemma [2J; this
may introduce up to $|G| + 3n$ new letters. Hence, in total this might introduce up to $2n|G| + 6n^2$
new letters to $\Sigma$.

Thus, $|\Sigma|$ is $\mathrm{poly}(n)$.
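
For concreteness, the four bounds from the case analysis can be added up. The Python sketch below only restates the arithmetic of the text for a single iteration of the main loop; the function name `new_letters_per_iteration` and the parameter `grammar_size` (standing for $|G|$) are hypothetical.

```python
def new_letters_per_iteration(n, grammar_size):
    """Tally the per-iteration bounds on new letters; grammar_size stands for |G|."""
    counts = {
        "inner letter": 60 * n**2,
        "non-crossing pair": 60 * n**2,
        "appearance compression": 2 * n * grammar_size,
        "crossing pair": 2 * n * grammar_size + 6 * n**2,
    }
    counts["total"] = sum(counts.values())
    return counts


n = 10
bounds = new_letters_per_iteration(n, grammar_size=60 * n**2)
assert bounds["total"] == 126 * n**2 + 4 * n * (60 * n**2)
```

With the in-phase bound $|G| \le 60n^2$ the total is $126n^2 + 240n^3$ new letters per iteration, which is polynomial in $n$ as claimed.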

It is left to bound the size of $N$: as there are at most $n$ nonterminal transitions, the size
of the transition function of $N$ is at most $O(|Q_N|^2 \cdot (|\Sigma| + n))$. So it is enough to bound the number
of states of $N$ by $\mathrm{poly}(n)$. Recall that for simplicity we assumed that all transitions in the input
DFA are labelled with different SLPs; in particular, this polynomial bound holds for the input
instance.

There are only two situations in which new states are introduced:

changing $a$ from an outer letter to an inner one (Algorithm [4] in line [6]). This algorithm introduces
two states per nonterminal transition, and there are at most $n$ such transitions, by (Aut^).
So it is enough to estimate how many times it is invoked. It is run for
each outer letter, and there are at most $2n$ outer letters. Thus, in one iteration of the main
loop of Algorithm [1], it adds at most $O(n^2)$ states in total.

popping letters (Algorithm [5] in line [10]). A similar argument applies, the only difference
being that Algorithm [5] introduces one new state per nonterminal transition.

It is left to recall that, by Lemma S], the main loop of Algorithm [1] is run $O(n)$ times. This ends
the proof of the lemma. □